NSF PAR Search | NSF Public Access Repository

OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model

Wang, Zheng; Wang, Yuke; Feng, Boyuan; Huang, Guyue; Mudigere, Dheevatsa; Muthiah, Bharath; Li, Ang; Ding, Yufei (July 2024, USENIX Association)

The deployment of Deep Learning Recommendation Models (DLRMs) involves parallelizing extra-large embedding tables (EMTs) across multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and excessive inter-GPU communication. To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs, aiming to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMT parallelism and propose a heuristic search algorithm that efficiently approximates an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared-memory-based system that supports the lightweight but complex computation and communication patterns of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves average speedups of 2.3× in training and 4.0× in inference over state-of-the-art DLRM frameworks.
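The abstract does not spell out OPER's heuristic search, so what follows is only a minimal Python sketch of the underlying balancing problem, under loudly stated assumptions: each embedding table is placed whole on one GPU, its cost is approximated from sampled inputs as mean lookups per batch times embedding dimension, and a simple largest-first greedy rule (the classic LPT heuristic, not OPER's algorithm) assigns tables to the least-loaded GPU. The function names, table names, and numbers are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def estimate_cost(lookups_per_batch: List[int], dim: int) -> float:
    """Assumed input-dependent cost proxy: mean lookups per batch x embedding width."""
    return sum(lookups_per_batch) / len(lookups_per_batch) * dim


def greedy_place(costs: Dict[str, float], num_gpus: int) -> Tuple[List[List[str]], List[float]]:
    """Place whole tables on GPUs, largest cost first, each onto the least-loaded GPU (LPT)."""
    heap = [(0.0, g) for g in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement: List[List[str]] = [[] for _ in range(num_gpus)]
    loads = [0.0] * num_gpus
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, g = heapq.heappop(heap)  # least-loaded GPU so far
        placement[g].append(name)
        loads[g] = load + cost
        heapq.heappush(heap, (loads[g], g))
    return placement, loads


# Hypothetical tables: sampled lookups per batch and embedding dimension.
costs = {
    "user_id": estimate_cost([1000, 1000], 64),
    "item_id": estimate_cost([500, 700], 128),
    "category": estimate_cost([2000, 2000], 16),
    "region": estimate_cost([100, 300], 32),
}
placement, loads = greedy_place(costs, num_gpus=2)
print(placement, loads)
# [['item_id', 'region'], ['user_id', 'category']] [83200.0, 96000.0]
```

Because the cost comes from sampled lookups rather than table size alone, the same tables can be placed differently under a different input distribution, which is the input-dependent behavior the abstract says coarse-grained schemes overlook. OPER's fine-grained parallelization goes beyond this whole-table placement.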
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference

Feng, Boyuan; Wang, Zheng; Wang, Yuke; Yang, Shu; Ding, Yufei. (October 2023, ACM)

Zero-knowledge neural networks are drawing increasing attention for guaranteeing the computation integrity and privacy of neural networks (NNs) based on the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the performance of zkSNARK NNs is far from optimal due to million-scale circuit computation with heavy scalar-level dependency. In this paper, we propose a type-based optimization framework for efficient zero-knowledge NN inference, namely ZENO (ZEro knowledge Neural network Optimizer). We first introduce the ZENO language construct to maintain high-level semantics and type information (e.g., privacy and tensor) that allow more aggressive optimizations. We then propose privacy-type-driven and tensor-type-driven optimizations to further optimize the generated zkSNARK circuit. Finally, we design a set of NN-centric system optimizations to further accelerate zkSNARK NNs. Experimental results show that ZENO achieves up to 8.5× end-to-end speedup over state-of-the-art zkSNARK NNs. We reduce the proof time for VGG16 from 6 minutes to 48 seconds, which makes zkSNARK NNs practical.
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Wang, Yuke; Feng, Boyuan; Wang, Zheng; Huang, Guyue; Ding, Yufei. (July 2023, USENIX Association)

Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, GNN performance is usually unsatisfactory due to highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of the sparse GNN workload. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate TC-GNN with the PyTorch framework for high programmability. Rigorous experiments show an average speedup of 1.70× over the state-of-the-art DGL framework across various models and datasets.
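The abstract names a sparse graph translation technique without detailing it; one plausible form of such a translation, sketched here in simplified Python, is to group rows into fixed-height windows and renumber each window's distinct nonzero columns consecutively, so nonzeros scattered across the column range pack into a few dense tiles. The edge-list input, the 4-row window, the 8-column tile width (chosen for illustration, not to match any Tensor Core shape), and the function names are assumptions; this is not TC-GNN's implementation, which runs on the GPU and feeds the tiles to TCUs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (row, col) of one nonzero in the sparse adjacency matrix


def condense_row_windows(edges: List[Edge], window: int) -> Dict[int, Tuple[List[int], List[Edge]]]:
    """Group rows into windows of `window` rows and renumber each window's
    nonzero columns to 0..k-1 so its nonzeros pack into a few dense tiles."""
    by_window: Dict[int, List[Edge]] = {}
    for r, c in edges:
        by_window.setdefault(r // window, []).append((r, c))
    condensed: Dict[int, Tuple[List[int], List[Edge]]] = {}
    for w, es in by_window.items():
        cols = sorted({c for _, c in es})  # unique original columns in this window
        new_col = {c: i for i, c in enumerate(cols)}
        local = [(r - w * window, new_col[c]) for r, c in es]  # window-local coordinates
        condensed[w] = (cols, local)  # cols maps condensed index -> original column
    return condensed


def tiles_touched(cols: List[int], tile_width: int) -> int:
    """Number of distinct column tiles of width `tile_width` that contain a nonzero."""
    return len({c // tile_width for c in cols})


edges = [(0, 5), (1, 900), (2, 5), (3, 4000), (5, 17), (6, 17)]
windows = condense_row_windows(edges, window=4)
cols0, local0 = windows[0]
print(cols0, local0)  # [5, 900, 4000] [(0, 0), (1, 1), (2, 0), (3, 2)]
print(tiles_touched(cols0, 8), tiles_touched(list(range(len(cols0))), 8))  # 3 1
```

In this toy graph the first window's columns 5, 900, and 4000 would touch three separate 8-wide tiles; after renumbering they occupy one. The retained `cols` list is what lets the matching node-feature rows be gathered when the condensed tile is multiplied.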
On Adversarial Robustness of Point Cloud Semantic Segmentation

https://doi.org/10.1109/DSN58367.2023.00056

Xu, Jiacen; Zhou, Zhe; Feng, Boyuan; Ding, Yufei; Li, Zhou (June 2023, Annual IEEE/IFIP International Conference on Dependable Systems and Networks Supplemental Volume (DSN-S))
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms

Wang, Yuke; Feng, Boyuan; Wang, Zheng; Barker, Kevin; Li, Ang; Ding, Yufei. (July 2023, USENIX Association)

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for multi-GPU platforms. However, existing multi-GPU GNN systems optimize computation and communication individually, following the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule and optimize computation and communication operations for high performance. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline, which facilitates fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively.
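MGG's pipeline runs inside a GPU kernel, which plain Python cannot reproduce; the sketch below only illustrates the scheduling idea of overlapping communication with computation, using a single background thread as a stand-in for remote-GPU transfers. The `pipelined` helper, the chunking, the `fetch` and `compute` callables, and the in-memory `remote` dictionary are hypothetical stand-ins, not MGG's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, TypeVar

C = TypeVar("C")  # one chunk of work (e.g., a list of remote neighbor ids)
T = TypeVar("T")  # fetched data for a chunk
R = TypeVar("R")  # per-chunk result


def pipelined(chunks: Sequence[C], fetch: Callable[[C], T], compute: Callable[[T], R]) -> List[R]:
    """Two-stage software pipeline: while chunk i is computed, chunk i+1 is being fetched."""
    results: List[R] = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as comm:  # single "communication" worker
        pending = comm.submit(fetch, chunks[0])  # prologue: start the first transfer
        for i in range(len(chunks)):
            data = pending.result()  # wait until chunk i has arrived
            if i + 1 < len(chunks):
                pending = comm.submit(fetch, chunks[i + 1])  # issue the next transfer early
            results.append(compute(data))  # runs while the next fetch is in flight
    return results


# Hypothetical remote node embeddings and a sum-aggregation step.
remote: Dict[int, List[float]] = {n: [float(n), 1.0] for n in range(8)}
chunks = [[0, 1], [2, 3, 4], [7]]
out = pipelined(
    chunks,
    fetch=lambda ids: [remote[n] for n in ids],
    compute=lambda rows: [sum(col) for col in zip(*rows)],  # sum each feature column
)
print(out)  # [[1.0, 2.0], [9.0, 3.0], [7.0, 1.0]]
```

A single communication worker keeps at most one transfer in flight, the usual double-buffering depth. Per the abstract, MGG instead builds and maps its pipeline with GNN- and GPU-aware heuristics and tunes it at runtime.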
QGTC: accelerating quantized graph neural networks via GPU tensor core

https://doi.org/10.1145/3503221.3508408

Wang, Yuke; Feng, Boyuan; Ding, Yufei (March 2022, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

Faith: An Efficient Framework for Transformer Verification on GPUs

Feng, Boyuan; Tang, Tianqi; Wang, Yuke; Chen, Zhaodong; Wang, Zheng; Yang, Shu; Xie, Yuan; Ding, Yufei (July 2022, Proceedings of the 2022 USENIX Annual Technical Conference)

An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks

Wang, Yuke; Feng, Boyuan; Peng, Xueqiao; Ding, Yufei (November 2021, 30th ACM International Conference on Information and Knowledge Management (CIKM ’21))
APNN-TC: accelerating arbitrary precision neural networks on Ampere GPU tensor cores

https://doi.org/10.1145/3458817.3476157

Feng, Boyuan; Wang, Yuke; Geng, Tong; Li, Ang; Ding, Yufei (November 2021, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis)
Saga: Sparse Adversarial Attack on EEG-Based Brain Computer Interface

https://doi.org/10.1109/ICASSP39728.2021.9413507

Feng, Boyuan; Wang, Yuke; Ding, Yufei (June 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))